Detailed protocol


To map reads to repetitive elements, we created a pseudo-genome that contains only the repeat sequences. The in-house scripts used are available here. In
summary, the pseudo-genome can be created as follows (on a Linux operating system):


Required software: 
Make sure the following software is installed:


• Python version 2.7.3
• bowtie 1 version 0.12.9
• bedtools version 2.20.1
• samtools version >=1.6
• Picard tools >= 2.6.0
• STAR >=2.5.2b


Make sure that the Biopython library is installed 


pip install BioPython 
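
Before going further, it can help to confirm that the tools listed above are reachable from the shell. The following is a minimal sketch, not part of the original scripts: the helper name missing_tools is hypothetical, and the executable names (python, bowtie, bedtools, samtools, STAR) are assumed to match the default names used by each package. Picard is usually distributed as a .jar file, so it is not checked here.

```shell
# Print every command from the argument list that is not found on PATH.
# (missing_tools is a hypothetical helper, not part of the repository.)
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Executable names below are assumptions; adjust them to your installation.
missing=$(missing_tools python bowtie bedtools samtools STAR)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All command-line tools were found."
fi

# Check that Biopython can be imported by the python found on PATH.
if command -v python >/dev/null 2>&1; then
    python -c "import Bio; print('Biopython ' + Bio.__version__)" 2>/dev/null \
        || echo "Biopython could not be imported"
fi
```

This only checks that each tool is present; it does not compare version numbers, which still need to be verified against the list above.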


Download required files: 


1. Clone our code repository 
git clone https://github.com/sirusb/2CLike_analysis.git
2. Go to the Pseudogenome folder 
cd 2CLike_analysis/Pseudogenome
3. Download the mm9 repeats annotation from the RepEnrich Google Drive here and put it in the Pseudogenome folder.
4. Decompress the downloaded mm9_repeatmasker_clean.txt.gz file as follows:
gunzip mm9_repeatmasker_clean.txt.gz 
5. Create an mm9 FASTA file that contains all the chromosomes present in mm9_repeatmasker_clean.txt using the following bash script:
## List the chromosomes named in column 5 of the annotation
## (sort -u also removes duplicates that are not listed contiguously)
chroms=$(awk '{print $5}' mm9_repeatmasker_clean.txt | grep chr | sort -u)
genome_version='mm9'
## Download the different chromosome .fa files
for f in $chroms
do
echo "Downloading ${f}.fa.gz"
wget http://hgdownload.cse.ucsc.edu/goldenPath/${genome_version}/chromosomes/${f}.fa.gz -O ${f}.fa.gz
## Append the decompressed chromosome to the genome FASTA
zcat ${f}.fa.gz >> ${genome_version}.fa
done
## Remove intermediate files
echo "removing intermediate files"
rm chr*.fa.gz


6. Open the file run_buildPseudogenome.sh and edit the path to Picard tools. 
7. Run the run_buildPseudogenome.sh script 
sh run_buildPseudogenome.sh 


8. Once the script finishes running, it will have created the rms_pseudo_out folder, which contains the newly created genome and its STAR index.
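
As an optional check on step 5, the FASTA file can be compared against the annotation to confirm that every chromosome was downloaded. This is a minimal sketch under stated assumptions: the helper name missing_chroms is hypothetical, and the chromosome name is assumed to be in column 5 of the annotation, as in the download script of step 5.

```shell
# Print chromosomes that appear in the annotation but have no header
# in the FASTA file. (missing_chroms is a hypothetical helper.)
# Usage: missing_chroms <annotation.txt> <genome.fa>
missing_chroms() {
    annot="$1"
    fasta="$2"
    names=$(mktemp)
    # Sequence names are the first word of each '>' header line.
    grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | sort -u > "$names"
    # Keep annotation chromosomes that do not exactly match any FASTA name.
    awk '{print $5}' "$annot" | grep chr | sort -u | grep -vxFf "$names" || true
    rm -f "$names"
}

# Run the check only if both files from steps 4 and 5 are present.
if [ -f mm9_repeatmasker_clean.txt ] && [ -f mm9.fa ]; then
    missing_chroms mm9_repeatmasker_clean.txt mm9.fa
fi
```

An empty output means that every chromosome named in the annotation has a matching sequence in mm9.fa; any name that is printed should be downloaded again before building the pseudo-genome.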